This analysis performs differential gene expression analysis on RNA-seq data from maize leaf samples across multiple experimental factors:
The analysis uses the limma-voom pipeline to identify genes that respond to:
This model describes the log-transformed expression \(Y_{ijrs}\) of gene \(g\) in sample from leaf stage \(s\), plant \(r\), under phosphorus treatment \(i\), with Inv4m genotype \(j\). Each gene is fitted separately using a common design matrix, producing gene-specific regression coefficients.
\[ \begin{aligned} Y_{ijrs} = {}& \beta_0 + \beta_1 \text{Leaf}_s + \beta_2 \text{P}_i + \beta_3 \text{Inv4m}_j \\ & + \beta_4 [\text{Leaf} \times \text{P}]_{si} + \beta_5 [\text{P} \times \text{Inv4m}]_{ij} \\ & + \beta_6 [\text{Leaf} \times \text{Inv4m}]_{sj} + u_r + \varepsilon_{ijrs} \end{aligned} \]
with random effect and residuals:
\[ u_r \sim \mathcal{N}(0, \sigma^2_u), \quad \varepsilon_{ijrs} \sim \mathcal{N}(0, \phi_{ijrs}\sigma^2) \]
| Symbol | Description | Varies Over | Notes |
|---|---|---|---|
| \(Y_{ijrs}\) | \(\log_2\)(CPM) expression in sample | samples | Response variable |
| \(\beta_0\) | Intercept (mean expression at centered leaf stage, +P, reference genotype) | genes | Baseline expression |
| \(\beta_1\) | Slope for leaf developmental stage (centered) | genes | Linear change with leaf age |
| \(\beta_2\) | Effect of phosphorus treatment (−P vs. +P) | genes | Treatment main effect |
| \(\beta_3\) | Effect of Inv4m genotype | genes | Genotype main effect |
| \(\beta_4\) | Interaction: leaf stage × treatment | genes | Tests if developmental slope differs under P deficiency |
| \(\beta_5\) | Interaction: phosphorus × Inv4m | genes | Tests if Inv4m modifies P response |
| \(\beta_6\) | Interaction: leaf stage × Inv4m | genes | Tests if developmental slope depends on Inv4m |
| \(u_r\) | Random effect of plant | samples (within plant) | Accounts for within-plant correlation among leaves |
| \(\phi_{ijrs}\) | Voom-derived precision weights | samples | Captures heteroskedastic measurement error |
| \(\varepsilon_{ijrs}\) | Residual error | samples | Random error term |
Subscript indexing: - \(i\): Treatment level (+P or −P) - \(j\): Genotype (Inv4m present or absent) - \(r\): Plant replicate (1-4 per treatment × genotype) - \(s\): Leaf developmental stage (continuous, centered)
leaf_c (so the intercept represents average expression at
mean leaf stage).Genes are classified as:
in_Inv4m: Within the inversion properin_cis: Within the shared introgression (includes
inversion + flanking)in_trans: Outside the shared introgression regionThree levels of stringency for different analyses:
is_DEG).is_hiconf_DEG).is_selected_DEG).library(edgeR)
library(limma)
library(rtracklayer)
library(GenomicRanges)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(ggpubr)
library(ggfx)
library(ggtext)
library(robustbase)
counts <- read.csv(file.path(paths$data, "inv4mRNAseq_gene_sample_exp.csv"))
{
cat("Loaded expression\n")
cat(" Dimensions:", dim(counts), "\n")
cat(" Genes:", nrow(counts), "\n")
cat(" Samples:", ncol(counts) - 2, "\n")
}
## Loaded expression
## Dimensions: 39756 62
## Genes: 39756
## Samples: 60
sampleInfo <- read.csv(file.path(paths$data, "PSU2022_metadata.csv"))
# Recode Treatment: High_P -> +P, Low_P -> -P
sampleInfo$Treatment <- factor(sampleInfo$Treatment,
levels = c("High_P", "Low_P"),
labels = c("+P", "-P"))
# Recode Genotype: CTRL -> CTRL, INV4m -> Inv4m
sampleInfo$Genotype <- factor(sampleInfo$Genotype,
levels = c("CTRL", "INV4m"),
labels = c("CTRL", "Inv4m"))
# RNA_Plant already exists in new metadata (actual plant field IDs)
{
cat("\nSample meta\n")
cat(" Total samples:", nrow(sampleInfo), "\n")
cat(" Genotypes:", paste(unique(sampleInfo$Genotype), collapse = ", "), "\n")
cat(" Treatments:", paste(unique(sampleInfo$Treatment), collapse = ", "), "\n")
}
##
## Sample meta
## Total samples: 64
## Genotypes: CTRL, Inv4m
## Treatments: -P, +P
# Extract gene IDs
gene_ids <- data.frame(gene = counts[, 2])
# Convert to matrix and set row names
counts <- as.matrix(counts[, -c(1:2)])
rownames(counts) <- gene_ids$gene
# Use library column for sample matching (side_tag has Excel errors)
sampleNames <- colnames(counts)
# Reorder metadata to match count matrix column order
sampleInfo <- sampleInfo[match(sampleNames, sampleInfo$library), ]
{
cat("\nAll samples in metadata:",
all(sampleNames %in% sampleInfo$library), "\n")
cat("Count matrix prepared:\n")
cat(" Genes:", nrow(counts), "\n")
cat(" Samples:", ncol(counts), "\n")
}
##
## All samples in metadata: TRUE
## Count matrix prepared:
## Genes: 39756
## Samples: 60
# Create DGEList with counts and sample information
y <- DGEList(counts = counts, samples = sampleInfo)
# Define groups from Treatment and Genotype interaction
y$group <- interaction(y$samples$Treatment, y$samples$Genotype)
cat("\nDGEList object created\n")
##
## DGEList object created
head(y$samples)
## group lib.size norm.factors library top_tag rna_ctn rna_batch1
## L01_S1_L002 1 327979 1 L01_S1_L002 L01 49 1
## L02_S2_L002 1 798505 1 L02_S2_L002 L02 101 1
## L03_S3_L002 1 2398804 1 L03_S3_L002 L03 41 1
## L04_S4_L002 1 2476414 1 L04_S4_L002 L04 94 1
## L05_S5_L002 1 31396360 1 L05_S5_L002 L05 42 1
## L06_S6_L002 1 30520111 1 L06_S6_L002 L06 63 1
## tube_n side NCSU_RNA_plant Treatment Genotype leaf_tissue leaf_node
## L01_S1_L002 1 L 1 -P CTRL 1 1
## L02_S2_L002 2 L 1 -P CTRL 2 3
## L03_S3_L002 3 L 1 -P CTRL 3 5
## L04_S4_L002 4 L 1 -P CTRL 4 7
## L05_S5_L002 5 L 2 -P Inv4m 1 1
## L06_S6_L002 6 L 2 -P Inv4m 2 3
## side_tag TIME decimal_time row COLLECTOR NOTE S Plot
## L01_S1_L002 -P1_B73_L1L 12H 37M 0S 12.61667 3056 SERGIO 1 PlotVI
## L02_S2_L002 -P1_B73_L2L 12H 42M 0S 12.70000 3056 SERGIO 2 PlotVI
## L03_S3_L002 -P1_B73_L3L 12H 43M 0S 12.71667 3056 SERGIO 3 PlotVI
## L04_S4_L002 -P1_B73_L4L 12H 44M 0S 12.73333 3056 SERGIO 4 PlotVI
## L05_S5_L002 -P1MICH_L1L 12H 46M 0S 12.76667 3059 SERGIO NA PlotVI
## L06_S6_L002 -P1MICH_L2L 12H 48M 0S 12.80000 3059 SERGIO NA PlotVI
## Rep Plot_Row Plot_Column std_count_06102022 RNA_Q borde notes
## L01_S1_L002 6 3 3 4 3 variacion NA
## L02_S2_L002 6 3 3 4 3 variacion NA
## L03_S3_L002 6 3 3 4 3 variacion NA
## L04_S4_L002 6 3 3 4 3 variacion NA
## L05_S5_L002 6 4 3 4 5 NA
## L06_S6_L002 6 4 3 4 5 NA
## leaves_per_plant RNA_Plant
## L01_S1_L002 7 26
## L02_S2_L002 7 26
## L03_S3_L002 7 26
## L04_S4_L002 7 26
## L05_S5_L002 8 16
## L06_S6_L002 8 16
Using filterByExpr to remove genes with low counts
across samples.
# Keep genes with sufficient expression
keep <- filterByExpr(y, group = y$group)
y_filtered <- y[keep, ]
{
cat("\nExpression filtering:\n")
cat(" Genes before:", nrow(y), "\n")
cat(" Genes after:", nrow(y_filtered), "\n")
cat(" Genes removed:", sum(!keep), "\n")
}
##
## Expression filtering:
## Genes before: 39756
## Genes after: 24249
## Genes removed: 15507
Compute MDS on all libraries (after gene filtering but before sample filtering) to identify low-quality samples based on library size.
# MDS with all samples (before library size filtering)
mds_all <- plotMDS(y_filtered, pch = 21, plot = TRUE)
# Histogram of library sizes
hist(
y$samples$lib.size / 1e6,
main = "Library Size Distribution",
xlab = "Library Size (millions of reads)",
breaks = 20
)
{
cat("\nLibrary size summary:\n")
print(summary(y$samples$lib.size))
cat("\nSamples with lib.size < 20 million:",
sum(y$samples$lib.size < 2e7), "\n")
}
##
## Library size summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 327979 10716231 31446964 27205445 36129276 56805083
##
## Samples with lib.size < 20 million: 17
# Prepare data for plotting
mds_qc_data <- y_filtered$samples %>%
mutate(
dim1 = mds_all$x,
dim2 = mds_all$y,
lib_size_millions = lib.size / 1e6
)
# Plot MDS colored by library size
ggplot(mds_qc_data, aes(x = dim1, y = dim2, color = lib_size_millions)) +
geom_point(size = 3) +
scale_color_viridis_c(option = "plasma") +
labs(
x = paste0("Dim1 (", round(100 * mds_all$var.explained[1]), "%)"),
y = paste0("Dim2 (", round(100 * mds_all$var.explained[2]), "%)"),
color = "Library Size\n(millions)"
) +
theme_classic2(base_size = 15)
Samples with library size < 20 million reads are considered low quality.
# Flag low count libraries
y_filtered$samples$lowCount <- y_filtered$samples$lib.size < 2e7
# Remove low quality samples
y_filtered_bySample <- y_filtered[, !y_filtered$samples$lowCount]
{
cat("\nLow quality libraries:\n")
print(table(y_filtered$samples$lowCount))
cat("\nSamples after filtering:\n")
cat(" Retained:", ncol(y_filtered_bySample), "\n")
cat(" Removed:", sum(y_filtered$samples$lowCount), "\n")
}
##
## Low quality libraries:
##
## FALSE TRUE
## 43 17
##
## Samples after filtering:
## Retained: 43
## Removed: 17
Verify experimental design balance across factors after quality filtering.
with(y_filtered_bySample$samples, {
cat("\n=== Sample Distribution by Factors ===\n")
cat("\n-- Treatment --\n")
print(table(Treatment))
cat("\n-- Genotype --\n")
print(table(Genotype))
cat("\n-- Leaf Tissue --\n")
print(table(leaf_tissue))
cat("\n-- Treatment × Leaf Tissue --\n")
print(table(Treatment, leaf_tissue))
cat("\n-- Genotype × Leaf Tissue --\n")
print(table(Genotype, leaf_tissue))
cat("\n-- Treatment × Genotype × Leaf Tissue (3-way) --\n")
print(table(Treatment, Genotype, leaf_tissue))
})
##
## === Sample Distribution by Factors ===
##
## -- Treatment --
## Treatment
## +P -P
## 24 19
##
## -- Genotype --
## Genotype
## CTRL Inv4m
## 20 23
##
## -- Leaf Tissue --
## leaf_tissue
## 1 2 3 4
## 11 11 11 10
##
## -- Treatment × Leaf Tissue --
## leaf_tissue
## Treatment 1 2 3 4
## +P 6 6 6 6
## -P 5 5 5 4
##
## -- Genotype × Leaf Tissue --
## leaf_tissue
## Genotype 1 2 3 4
## CTRL 4 6 5 5
## Inv4m 7 5 6 5
##
## -- Treatment × Genotype × Leaf Tissue (3-way) --
## , , leaf_tissue = 1
##
## Genotype
## Treatment CTRL Inv4m
## +P 3 3
## -P 1 4
##
## , , leaf_tissue = 2
##
## Genotype
## Treatment CTRL Inv4m
## +P 3 3
## -P 3 2
##
## , , leaf_tissue = 3
##
## Genotype
## Treatment CTRL Inv4m
## +P 3 3
## -P 2 3
##
## , , leaf_tissue = 4
##
## Genotype
## Treatment CTRL Inv4m
## +P 3 3
## -P 2 2
MDS reveals sources of variation in gene expression across samples. Dimensions 3 and 4 are extracted from eigenvectors scaled by square root of variance explained.
mds <- plotMDS(
y_filtered_bySample,
pch = 21,
label = y_filtered_bySample$samples$library,
plot = FALSE
)
# Store MDS coordinates in sample data
d <- y_filtered_bySample$samples
d$dim1 <- mds$x
d$dim2 <- mds$y
d$dim3 <- mds$eigen.vectors[, 3] * sqrt(mds$var.explained[3])
d$dim4 <- mds$eigen.vectors[, 4] * sqrt(mds$var.explained[4])
# Prepare factors for plotting
d$Treatment <- factor(d$Treatment)
d$Genotype <- factor(d$Genotype)
d$RNA_Plant <- factor(d$RNA_Plant)
# Variance explained by each dimension
tibble(
dimension = paste("Dim", 1:4),
var_explained = round(mds$var.explained[1:4], 4)
)
## # A tibble: 4 × 2
## dimension var_explained
## <chr> <dbl>
## 1 Dim 1 0.265
## 2 Dim 2 0.125
## 3 Dim 3 0.0842
## 4 Dim 4 0.0724
Exploring which experimental factors drive the main dimensions of variation.
# Define custom green to orange palette for leaf positions
green_to_orange <- c(
"#00954A",
"#A4DF56",
"#D7E23C",
"#FF9A1F"
)
# Treatment
p1 <- ggplot(d, aes(x = dim1, y = dim2, color = Treatment)) +
geom_point(size = 3) +
theme_classic(base_size = 15) +
theme(legend.position = "top")
# Row position
p2 <- ggplot(d, aes(x = dim1, y = dim2, color = row)) +
geom_point(size = 3) +
guides(
color = guide_colourbar(
title = "Row Position",
direction = "horizontal",
barwidth = 15,
barheight = 1,
title.position = "top",
title.hjust = 0.5
),
shape = "none"
) +
theme_classic(base_size = 15) +
theme(legend.position = "top")
# Collection time
p3 <- ggplot(d, aes(x = dim1, y = dim2, color = decimal_time)) +
geom_point(size = 3) +
guides(
color = guide_colourbar(
title = "Collection Time",
direction = "horizontal",
barwidth = 15,
barheight = 1,
title.position = "top",
title.hjust = 0.5
),
shape = "none"
) +
theme_classic(base_size = 15) +
theme(legend.position = "top")
# Collector
p4 <- ggplot(d, aes(x = dim1, y = dim2, color = COLLECTOR)) +
geom_point(size = 3) +
scale_color_discrete(labels = c("Team 1", "Team 2")) +
theme_classic(base_size = 15) +
theme(legend.position = "top") +
labs(color = "Collector")
# Genotype
p5 <- ggplot(d, aes(x = dim1, y = dim2, color = Genotype)) +
geom_point(size = 3) +
theme_classic(base_size = 15) +
theme(legend.position = "top") +
labs(color = "Genotype")
# Leaf tissue
p6 <- d %>%
mutate(leaf = factor(leaf_tissue)) %>%
ggplot(aes(x = dim1, y = dim2, color = leaf)) +
scale_color_manual(values = green_to_orange) +
geom_point(size = 3) +
theme_classic(base_size = 15) +
theme(legend.position = "top") +
labs(color = "Leaf")
ggarrange(p1, p2, p3, p4, p5, p6, ncol = 3, nrow = 2)
transcript_MDS_12 <- d %>%
mutate(leaf = factor(leaf_tissue)) %>%
ggplot(aes(x = dim1, y = dim2)) +
ggtitle("MDS Transcripts") +
xlab(paste0("dim1 (", round(100 * mds$var.explained[1]), "%)")) +
ylab(paste0("dim2 (", round(100 * mds$var.explained[2]), "%)")) +
geom_point(aes(fill = leaf, shape = Treatment), size = 4) +
scale_fill_manual(values = green_to_orange) +
scale_shape_manual(values = c(21, 25)) +
guides(
shape = guide_legend(
title = element_blank(),
order = 2,
override.aes = list(fill = "black"),
reverse = TRUE
),
fill = guide_legend(
title = "Leaf",
order = 1,
override.aes = list(geom = "point", shape = 22, size = 7)
)
) +
theme_classic2(base_size = 30) +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
legend.spacing = unit(0, "line"),
legend.box.spacing = unit(0, "in"),
legend.background = element_rect(fill = "transparent")
)
saveRDS(transcript_MDS_12, file.path(paths$intermediate, "transcript_MDS_12.rds"))
transcript_MDS_12
The third dimension separates by genotype.
# Set up genotype labels with italic formatting
labels <- c("CTRL", "*Inv4m*")
names(labels) <- c("CTRL", "INV4")
d %>%
ggplot(aes(x = dim3, y = dim4, fill = Genotype, shape = Treatment)) +
xlab(paste0("dim3 (", round(100 * mds$var.explained[3]), "%)")) +
ylab(paste0("dim4 (", round(100 * mds$var.explained[4]), "%)")) +
geom_point(size = 4) +
scale_fill_viridis_d(direction = -1, labels = labels) +
scale_shape_manual(values = c(24, 21)) +
guides(
shape = "none",
fill = guide_legend(
title = "Genotype",
order = 2,
override.aes = list(geom = "point", shape = 22, size = 7, reverse = TRUE)
)
) +
theme_classic2(base_size = 25) +
theme(
legend.position = c(0.89, 0.9),
legend.text = element_markdown(),
legend.spacing = unit(0, "line"),
legend.box.spacing = unit(0, "line")
)
# Calculate correlations and p-values
mds_cor_results <- tibble(
dimension = c("Dim1", "Dim2", "Dim3"),
factor = c("Treatment", "Leaf tissue", "Genotype"),
correlation = c(
cor(d$dim1, as.numeric(d$Treatment)),
cor(d$dim2, d$leaf_tissue),
cor(d$dim3, as.numeric(d$Genotype))
),
p_value = c(
cor.test(d$dim1, as.numeric(d$Treatment))$p.value,
cor.test(d$dim2, d$leaf_tissue)$p.value,
cor.test(d$dim3, as.numeric(d$Genotype))$p.value
)
) %>%
mutate(
adj_p_value = p.adjust(p_value, method = "fdr"),
correlation = round(correlation, 3),
p_value = signif(p_value, 3),
adj_p_value = signif(adj_p_value, 3)
)
mds_cor_results
## # A tibble: 3 × 5
## dimension factor correlation p_value adj_p_value
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Dim1 Treatment 0.501 6.15e- 4 6.15e- 4
## 2 Dim2 Leaf tissue -0.615 1.13e- 5 1.69e- 5
## 3 Dim3 Genotype 0.916 6.88e-18 2.06e-17
Apply TMM (Trimmed Mean of M-values) normalization to account for
sequencing depth differences, then fit a linear model with plant
blocking to account for within-plant correlation. The model includes
biological factors (Genotype, leaf_tissue,
Treatment) and their interactions. Voom transformation
estimates mean-variance relationships and computes precision weights for
each observation.
# Center the continuous leaf stage variable
# Centering subtracts the mean leaf age so that:
# - the intercept represents expression at the *average leaf age*,
# - the Treatment coefficient corresponds to the treatment effect at mean leaf stage,
# - and collinearity between leaf age and Treatment is minimized.
d$leaf_tissue_c <- scale(d$leaf_tissue, center = TRUE, scale = FALSE)
# Construct Plant_ID for blocking
# Derive PlantRep within each Treatment-Genotype combination
d <- d %>%
group_by(Treatment, Genotype) %>%
mutate(PlantRep = match(RNA_Plant, unique(RNA_Plant))) %>%
ungroup()
# Create Plant_ID (unique identifier for each plant)
d$Plant_ID <- factor(paste(d$Treatment, d$Genotype, d$PlantRep, sep = "_"))
cat("Plant blocking structure:\n")
## Plant blocking structure:
cat(" Unique plants:", length(unique(d$Plant_ID)), "\n")
## Unique plants: 13
print(table(d$Plant_ID))
##
## -P_CTRL_1 -P_CTRL_2 -P_CTRL_3 -P_Inv4m_1 -P_Inv4m_2 -P_Inv4m_3 -P_Inv4m_4
## 4 1 3 4 2 1 4
## +P_CTRL_1 +P_CTRL_2 +P_CTRL_3 +P_Inv4m_1 +P_Inv4m_2 +P_Inv4m_3
## 4 4 4 4 4 4
# Design matrix: biological factors + interactions (NO Plot_Row)
design <- model.matrix(
~ leaf_tissue_c * Treatment + Genotype * Treatment + Genotype * leaf_tissue_c,
d
)
# Voom transformation with precision weights
voomR <- voom(y_filtered_bySample, design = design, plot = FALSE)
# Save normalized expression for downstream analyses
saveRDS(voomR$E, file = file.path(paths$intermediate, "normalized_expression_logCPM_leaf_trt.rds"))
saveRDS(voomR, file = file.path(paths$intermediate, "normalized_expression_voom_object_leaf_trt.rds"))
{
cat("Normalization factors range:",
range(y_filtered_bySample$samples$norm.factors), "\n")
cat("Design matrix:", nrow(design), "samples ×", ncol(design),
"coefficients\n")
design %>% as.data.frame() %>% tibble() %>% print()
cat("Voom expression matrix:", nrow(voomR$E), "genes ×",
ncol(voomR$E), "samples\n")
}
## Normalization factors range: 1 1
## Design matrix: 43 samples × 7 coefficients
## # A tibble: 43 × 7
## `(Intercept)` leaf_tissue_c `Treatment-P` GenotypeInv4m
## <dbl> <dbl> <dbl> <dbl>
## 1 1 -1.47 1 1
## 2 1 -0.465 1 1
## 3 1 0.535 1 1
## 4 1 1.53 1 1
## 5 1 -1.47 1 1
## 6 1 0.535 1 1
## 7 1 -1.47 1 0
## 8 1 -0.465 1 0
## 9 1 0.535 1 0
## 10 1 1.53 1 0
## # ℹ 33 more rows
## # ℹ 3 more variables: `leaf_tissue_c:Treatment-P` <dbl>,
## # `Treatment-P:GenotypeInv4m` <dbl>, `leaf_tissue_c:GenotypeInv4m` <dbl>
## Voom expression matrix: 24249 genes × 43 samples
Fit linear model to voom-transformed data with plant blocking to account for within-plant correlation, then apply robust empirical Bayes moderation to stabilize variance estimates and improve power for differential expression testing.
# Estimate within-plant correlation
corfit <- duplicateCorrelation(voomR, design, block = d$Plant_ID)
cat("Plant blocking correlation:\n")
## Plant blocking correlation:
cat(" Consensus correlation:", round(corfit$consensus.correlation, 4), "\n")
## Consensus correlation: 0.1412
# Fit linear model with plant blocking
fit <- lmFit(voomR, design, block = d$Plant_ID, correlation = corfit$consensus.correlation)
# Apply empirical Bayes moderation
ebfit <- eBayes(fit)
{
cat("Model fitted:", nrow(fit$coefficients), "genes ×",
ncol(fit$coefficients), "coefficients\n")
cat("\nSignificant genes per coefficient (FDR < 0.05):\n")
print(colSums(abs(decideTests(ebfit))))
}
## Model fitted: 24249 genes × 7 coefficients
##
## Significant genes per coefficient (FDR < 0.05):
## (Intercept) leaf_tissue_c
## 22209 8177
## Treatment-P GenotypeInv4m
## 6531 466
## leaf_tissue_c:Treatment-P Treatment-P:GenotypeInv4m
## 3241 0
## leaf_tissue_c:GenotypeInv4m
## 1
We focus on biological effects with plant blocking to account for within-plant correlation.
# Define predictors of interest with ORIGINAL names from model
predictors_original <- c(
"leaf_tissue_c",
"Treatment-P",
"GenotypeINV4",
"Treatment-P:GenotypeInv4m",
"leaf_tissue_c:Treatment-P",
"leaf_tissue_c:GenotypeInv4m"
)
topTable(ebfit) %>% colnames()
## [1] "leaf_tissue_c" "Treatment.P"
## [3] "GenotypeInv4m" "leaf_tissue_c.Treatment.P"
## [5] "Treatment.P.GenotypeInv4m" "leaf_tissue_c.GenotypeInv4m"
## [7] "AveExpr" "F"
## [9] "P.Value" "adj.P.Val"
predictors_toptable <- c(
"leaf_tissue_c",
"Treatment.P",
"GenotypeINV4",
"Treatment.P.GenotypeInv4m",
"leaf_tissue_c.Treatment.P",
"leaf_tissue_c.GenotypeInv4m"
)
# Define STANDARDIZED names for output
predictors_standard <- c(
"Leaf",
"-P",
"Inv4m",
"Leaf:-P",
"Leaf:-P",
"Leaf:Inv4m"
)
# Create mapping
predictor_map <- setNames(predictors_standard, predictors_original)
{
cat("\nExtracting coefficients:\n")
for (i in seq_along(predictors_original)) {
cat(" ", predictors_original[i], "→", predictors_standard[i], "\n")
}
}
##
## Extracting coefficients:
## leaf_tissue_c → Leaf
## Treatment-P → -P
## GenotypeINV4 → Inv4m
## Treatment-P:GenotypeInv4m → Leaf:-P
## leaf_tissue_c:Treatment-P → Leaf:-P
## leaf_tissue_c:GenotypeInv4m → Leaf:Inv4m
For each predictor, extract results and calculate 95% confidence intervals.
#' Extract differential expression results for specified predictors
#'
#' @param ebfit An eBayes fitted model object from limma
#' @param predictor_map Named vector mapping coefficient names to display names
#'
#' @return A data frame with DE results for all specified predictors
extract_predictor_effects <- function(ebfit, predictor_map) {
# Get coefficient names from the model
coef_names <- colnames(coef(ebfit))
# Match predictor names to coefficient positions
coef_indices <- match(names(predictor_map), coef_names)
# Validate all predictors found
if (any(is.na(coef_indices))) {
missing <- names(predictor_map)[is.na(coef_indices)]
stop(
"Coefficients not found: ", paste(missing, collapse = ", "),
"\nAvailable: ", paste(coef_names, collapse = ", ")
)
}
# Extract results for each predictor
result_list <- lapply(seq_along(coef_indices), function(i) {
idx <- coef_indices[i]
tt <- topTable(ebfit, coef = idx, sort.by = "none", n = Inf)
# Calculate 95% confidence intervals
crit_value <- qt(0.975, ebfit$df.residual + ebfit$df.prior)
std_errors <- ebfit$stdev.unscaled[, idx] * sqrt(ebfit$s2.post)
tt$std_err <- std_errors
tt$upper <- tt$logFC + crit_value * std_errors
tt$lower <- tt$logFC - crit_value * std_errors
tt$predictor <- predictor_map[i]
tt$response <- rownames(tt)
tt
})
# Combine and format
effects <- do.call(rbind, result_list)
rownames(effects) <- NULL
effects %>%
dplyr::select(predictor, response, everything()) %>%
arrange(adj.P.Val)
}
# Map coefficients (use names from coef(), not topTable())
predictor_map <- c(
"leaf_tissue_c" = "Leaf",
"Treatment-P" = "-P",
"GenotypeInv4m" = "Inv4m",
"Treatment-P:GenotypeInv4m" = "Inv4m:-P",
"leaf_tissue_c:Treatment-P" = "Leaf:-P",
"leaf_tissue_c:GenotypeInv4m" = "Leaf:Inv4m"
)
# Verify coefficient names match
{
cat("Available coefficients:\n")
print(colnames(coef(ebfit)))
}
## Available coefficients:
## [1] "(Intercept)" "leaf_tissue_c"
## [3] "Treatment-P" "GenotypeInv4m"
## [5] "leaf_tissue_c:Treatment-P" "Treatment-P:GenotypeInv4m"
## [7] "leaf_tissue_c:GenotypeInv4m"
Combine all predictor results and annotate with gene information.
# Define factor level order for predictors
effect_order <- c("Leaf", "-P", "Leaf:-P", "Inv4m", "Inv4m:-P", "Leaf:Inv4m")
effects_df <- extract_predictor_effects(ebfit, predictor_map) %>%
dplyr::rename(gene = "response") %>%
mutate(predictor = factor(predictor, levels = effect_order)) %>%
mutate(neglogP = -log10(adj.P.Val))
# Add DEG and regulation flags
effects_df <- effects_df %>%
mutate(is_DEG = adj.P.Val < 0.05) %>%
mutate(
regulation = case_when(
is_DEG & logFC > 0 ~ "Upregulated",
is_DEG & logFC < 0 ~ "Downregulated",
.default = "Unregulated"
)
)
{
cat("\nTotal tests:", nrow(effects_df), "\n")
cat("Tests per predictor:\n")
print(table(effects_df$predictor))
cat("\nSignificant per predictor (FDR < 0.05):\n")
print(table(effects_df$predictor[effects_df$adj.P.Val < 0.05]))
cat("\nCombined effects table created:\n")
print(with(effects_df, table(predictor, regulation)))
}
##
## Total tests: 145494
## Tests per predictor:
##
## Leaf -P Leaf:-P Inv4m Inv4m:-P Leaf:Inv4m
## 24249 24249 24249 24249 24249 24249
##
## Significant per predictor (FDR < 0.05):
##
## Leaf -P Leaf:-P Inv4m Inv4m:-P Leaf:Inv4m
## 8177 6531 3241 466 0 1
##
## Combined effects table created:
## regulation
## predictor Downregulated Unregulated Upregulated
## Leaf 3126 16072 5051
## -P 4146 17718 2385
## Leaf:-P 1406 21008 1835
## Inv4m 278 23783 188
## Inv4m:-P 0 24249 0
## Leaf:Inv4m 1 24248 0
Identifies extreme differentially expressed genes using robust Mahalanobis distance based on the Minimum Covariance Determinant (MCD) method. This approach is resistant to the influence of outliers themselves, providing more reliable outlier detection than classical methods.
The MCD method estimates location and covariance using only the most
central 75% of observations (alpha = 0.75), making it
robust to contamination.
#' Calculate robust Mahalanobis distance for one predictor
#'
#' @param per_predictor Data frame for a single predictor
#' @param mcd_alpha Alpha parameter for MCD estimation
#'
#' @return Data frame with mahalanobis column added
calculate_robust_distance <- function(per_predictor, mcd_alpha) {
# Extract bivariate data (logFC and -log10(FDR))
bivariate <- per_predictor %>%
dplyr::select(logFC, neglogP) %>%
as.matrix()
# Compute robust location and covariance using MCD
mcd_result <- covMcd(bivariate, alpha = mcd_alpha)
# Calculate robust Mahalanobis distances
per_predictor$mahalanobis <- mahalanobis(
x = bivariate,
center = mcd_result$center,
cov = mcd_result$cov
)
per_predictor
}
#' Add robust Mahalanobis outlier flags
#'
#' @param data Data frame with effects
#' @param distance_quantile Quantile threshold for outlier detection
#' @param FDR FDR threshold for significance
#' @param mcd_alpha Alpha parameter for MCD estimation
#'
#' @return Data frame with mahalanobis and is_mh_outlier columns
add_mahalanobis_outliers <- function(
data = NULL,
distance_quantile = 0.05,
FDR = 0.05,
mcd_alpha = 0.75
) {
# Calculate robust Mahalanobis distance per predictor
data <- split(data, factor(data$predictor)) %>%
lapply(calculate_robust_distance, mcd_alpha = mcd_alpha) %>%
bind_rows()
# Chi-square cutoff for bivariate data (df = 2)
cutoff <- qchisq(p = 1 - distance_quantile, df = 2)
# Flag outliers: significant AND extreme distance
data$is_mh_outlier <- (data$adj.P.Val < FDR) & (data$mahalanobis > cutoff)
# Sort by distance within groups
data %>%
ungroup() %>%
group_by(predictor, regulation) %>%
arrange(-mahalanobis, .by_group = TRUE) %>%
ungroup()
}
effects_df <- add_mahalanobis_outliers(
effects_df,
distance_quantile = 0.05,
FDR = 0.05
)
{
cat("\nMahalanobis outliers detected:\n")
print(with(effects_df, table(predictor, is_mh_outlier)))
}
##
## Mahalanobis outliers detected:
## is_mh_outlier
## predictor FALSE TRUE
## Leaf 19388 4861
## -P 19542 4707
## Leaf:-P 21008 3241
## Inv4m 23783 466
## Inv4m:-P 24249 0
## Leaf:Inv4m 24248 1
Load gene symbols, functional descriptions (Pannzer), and genomic coordinates (B73 v5 GFF3). Gene IDs are cleaned and coordinates imported as both GRanges (for overlap operations) and data.frame (for dplyr filtering).
# Gene symbols and locus names
gene_symbol <- read.table(
file.path(paths$data, "gene_symbol.tab"),
quote = "",
header = TRUE,
sep = "\t",
na.strings = ""
)
# Pannzer functional annotations
pannzer <- read.table(
file.path(paths$data, "PANNZER_DESC.tab"),
quote = "",
header = TRUE,
sep = "\t"
) %>%
group_by(gene_model) %>%
dplyr::slice(1) %>%
dplyr::select(gene_model, desc)
# Create unique gene symbol table
gene_symbol_unique <- gene_symbol %>%
group_by(gene_model, locus_symbol) %>%
dplyr::slice(1) %>%
ungroup()
# Merge annotations
gene_pannzer <- gene_symbol_unique %>%
left_join(pannzer, by = c("gene_model" = "gene_model")) %>%
group_by(gene_model) %>%
dplyr::slice(1) %>%
ungroup() %>%
dplyr::select(gene_model, locus_symbol, locus_name, desc)
# Genomic coordinates (GRanges + data.frame)
v5_gff_file <- file.path(paths$data, "Zea_mays.Zm-B73-REFERENCE-NAM-5.0.59.chr.gff3")
genes_gr <- rtracklayer::import(v5_gff_file) %>%
subset(type == "gene" & seqnames %in% 1:10)
genes <- as.data.frame(genes_gr)
genes$ID <- gsub("gene:", "", genes$ID)
{
cat("Annotations loaded:\n")
cat(" Gene symbols:", nrow(gene_symbol), "\n")
cat(" Pannzer descriptions:", nrow(pannzer), "\n")
cat(" Merged annotations:", nrow(gene_pannzer), "\n")
cat(" Genomic features:", nrow(genes), "genes across",
length(unique(genes$seqnames)), "chromosomes\n")
}
## Annotations loaded:
## Gene symbols: 44364
## Pannzer descriptions: 28308
## Merged annotations: 44303
## Genomic features: 43459 genes across 10 chromosomes
Define three nested regions on chromosome 4: (1) Inv4m inversion proper (gene-defined boundaries), (2) shared introgression including flanking regions (manually verified from genotyping data), and (3) flanking regions (in introgression but outside inversion).
# Inv4m inversion boundaries (defined by specific genes)
inv4m_start <- genes[genes$ID == "Zm00001eb190470", "start"]
inv4m_end <- genes[genes$ID == "Zm00001eb194800", "end"]
# Shared introgression boundaries (from RNAseq genotype verification)
introgression_start <- 157012149
introgression_end <- 195900523
# Extract gene IDs for each region
inv4m_gene_ids <- genes %>%
filter(seqnames == 4, start >= inv4m_start, end <= inv4m_end) %>%
pull(ID)
shared_introgression_gene_ids <- genes %>%
filter(seqnames == 4, start >= introgression_start, end <= introgression_end) %>%
pull(ID)
flanking_introgression_gene_ids <- shared_introgression_gene_ids[
!(shared_introgression_gene_ids %in% inv4m_gene_ids)
]
{
cat("Inv4m inversion: Chr4:", inv4m_start, "-", inv4m_end,
"(", length(inv4m_gene_ids), "genes )\n")
cat("Shared introgression: Chr4:", introgression_start, "-", introgression_end,
"(", length(shared_introgression_gene_ids), "genes )\n")
cat("Introgressed flanking:", length(flanking_introgression_gene_ids), "genes\n")
}
## Inv4m inversion: Chr4: 172883675 - 188132113 ( 434 genes )
## Shared introgression: Chr4: 157012149 - 195900523 ( 1099 genes )
## Introgressed flanking: 665 genes
The goal of this section is to classify differentially expressed
genes (DEGs) into three tiers of stringency based on significance (FDR)
and effect size (log2FC), as well as a final set of genes
selected for deep-dive analysis.
The effect sizes are interpreted differently based on the predictor:
Leaf Position (leaf predictor):
This was modeled as a continuous slope (\(\beta\)). To be considered a large effect,
the total change across the 3 units (Leaf 1 → Leaf 4) must meet the
\(|\log_2FC|>3/2\) criterion. $ || =
|3| > 3/2 || > 1/2 = 0.5 $ We use \(|\beta| > 0.5\) as the threshold for the
leaf slope and interaction with -P.
Categorical Predictors (-P, Inv4m, Interaction): The standard large effect size threshold of \(|\log_2FC| > 1.5\) is applied directly.
This first tier identifies all genes that pass the statistical significance threshold, regardless of effect size.
{
cat("\nSignificant DEGs (is_DEG, FDR < 0.05) Count:\n")
print(with(effects_df, table(predictor, is_DEG)))
}
##
## Significant DEGs (is_DEG, FDR < 0.05) Count:
## is_DEG
## predictor FALSE TRUE
## Leaf 16072 8177
## -P 17718 6531
## Leaf:-P 21008 3241
## Inv4m 23783 466
## Inv4m:-P 24249 0
## Leaf:Inv4m 24248 1
The second tier filters the significant DEGs further by applying the custom large effect size thresholds.
is_large_effect <- rep(FALSE, nrow(effects_df))
is_leaf <- effects_df$predictor == "Leaf"
is_interaction <- effects_df$predictor == "Leaf:-P" |
effects_df$predictor == "Inv4m:-P" |
effects_df$predictor == "Leaf:Inv4m"
is_large_effect[is_leaf & abs(effects_df$logFC) > 0.5] <- TRUE
is_large_effect[is_interaction & abs(effects_df$logFC) > 0.5] <- TRUE
is_large_effect[!is_leaf & abs(effects_df$logFC) > 1.5] <- TRUE
effects_df <- effects_df %>%
mutate(is_hiconf_DEG = is_DEG & is_large_effect)
{
cat("\nHigh-Confidence DEGs (is_hiconf_DEG) Count:\n")
print(with(effects_df, table(predictor, is_hiconf_DEG)))
print(
with(
effects_df %>% filter(is_hiconf_DEG),
table(predictor, regulation, is_hiconf_DEG)
) %>%
as_tibble() %>%
arrange(desc(predictor), desc(regulation))
)
}
##
## High-Confidence DEGs (is_hiconf_DEG) Count:
## is_hiconf_DEG
## predictor FALSE TRUE
## Leaf 22770 1479
## -P 23545 704
## Leaf:-P 23554 695
## Inv4m 24094 155
## Inv4m:-P 24249 0
## Leaf:Inv4m 24248 1
## # A tibble: 12 × 4
## predictor regulation is_hiconf_DEG n
## <chr> <chr> <chr> <int>
## 1 Leaf:Inv4m Upregulated TRUE 0
## 2 Leaf:Inv4m Downregulated TRUE 1
## 3 Leaf:-P Upregulated TRUE 468
## 4 Leaf:-P Downregulated TRUE 227
## 5 Leaf Upregulated TRUE 716
## 6 Leaf Downregulated TRUE 763
## 7 Inv4m:-P Upregulated TRUE 0
## 8 Inv4m:-P Downregulated TRUE 0
## 9 Inv4m Upregulated TRUE 75
## 10 Inv4m Downregulated TRUE 80
## 11 -P Upregulated TRUE 520
## 12 -P Downregulated TRUE 184
The final tier, is_selected_DEG, selects the most
interesting genes for visualization and detailed annotation.
rank_threshold <- 10
effects_df <- effects_df %>%
group_by(is_hiconf_DEG, predictor, regulation) %>%
mutate(
pval_rank = row_number(dplyr::desc(neglogP)),
mahal_rank = row_number(dplyr::desc(mahalanobis))
) %>%
ungroup() %>%
mutate(
is_selected_DEG = (pval_rank <= rank_threshold & is_hiconf_DEG) |
(mahal_rank <= rank_threshold & is_hiconf_DEG)
)
{
cat(sprintf(
"\nSelected DEGs (is_selected_DEG, Top N=%d by Rank) Count:\n",
rank_threshold
))
with(
effects_df %>% filter(is_selected_DEG & regulation != "Unregulated"),
table(regulation, predictor, is_selected_DEG)
)
}
##
## Selected DEGs (is_selected_DEG, Top N=10 by Rank) Count:
## , , is_selected_DEG = TRUE
##
## predictor
## regulation Leaf -P Leaf:-P Inv4m Inv4m:-P Leaf:Inv4m
## Downregulated 19 16 13 11 0 1
## Upregulated 19 15 15 14 0 0
# Join gene annotations (locus_name, desc come from gene_pannzer)
effects_df <- effects_df %>%
left_join(
gene_pannzer,
by = c(gene = "gene_model"),
relationship = "many-to-many"
) %>%
mutate(desc_merged = coalesce(locus_name, desc)) %>%
dplyr::select(predictor, regulation, gene, locus_symbol, desc_merged, everything()) %>%
inner_join(
genes %>%
dplyr::select(gene = ID, CHR = seqnames, BP = start) %>%
mutate(CHR = as.character(CHR) %>% as.integer()),
by = "gene"
)
# Make locus_symbol the default locus_label
# Remove symbols corresponding to DNA markers in the consensus map
effects_df <- effects_df %>%
mutate(
locus_label = case_when(
is.na(locus_symbol) ~ NA_character_,
grepl("^si\\d*[a-h]", locus_symbol) ~ NA_character_,
grepl("^umc", locus_symbol) ~ NA_character_,
grepl("^csu", locus_symbol) ~ NA_character_,
grepl("^bnlg", locus_symbol) ~ NA_character_,
grepl("^php\\d\\d", locus_symbol) ~ NA_character_,
grepl("^csu\\d+\a", locus_symbol) ~ NA_character_,
grepl("^pco", locus_symbol) ~ NA_character_,
grepl("^IDP", locus_symbol) ~ NA_character_,
grepl("^TIDP\\d{4}", locus_symbol) ~ NA_character_,
grepl("^cl\\d*_\\d", locus_symbol) ~ NA_character_,
grepl("^cl\\d*_-", locus_symbol) ~ NA_character_,
grepl("^Zm00001eb", locus_symbol) ~ NA_character_,
grepl("^Zm00001d", locus_symbol) ~ NA_character_,
grepl("^GRM", locus_symbol) ~ NA_character_,
grepl("LOC\\d{4}", locus_symbol) ~ NA_character_,
TRUE ~ locus_symbol
)
)
{
cat("\nAnnotations added:\n")
cat(" Final columns:", ncol(effects_df), "\n")
cat(" Genes with coordinates:\n")
print(with(effects_df, table(predictor, !is.na(effects_df$CHR))))
}
##
## Annotations added:
## Final columns: 27
## Genes with coordinates:
##
## predictor TRUE
## Leaf 24011
## -P 24011
## Leaf:-P 24011
## Inv4m 24011
## Inv4m:-P 24011
## Leaf:Inv4m 24011
# Add curated locus labels
curated <- read.csv(file.path(paths$data, "selected_DEGs_curated_locus_label_2.csv")) %>%
dplyr::select(gene, locus_label, desc_merged) %>%
group_by(gene) %>%
dplyr::slice(1) %>%
ungroup()
# Coalesce curated locus_label into effects_df
effects_df <- effects_df %>%
left_join(curated, by = "gene", suffix = c("", "_curated")) %>%
mutate(desc_merged = ifelse(desc_merged == "", NA, desc_merged)) %>%
mutate(
locus_label = coalesce(locus_label_curated, locus_label),
desc_merged = coalesce(desc_merged_curated, desc_merged)
) %>%
dplyr::select(-locus_label_curated, -desc_merged_curated)
Classify genes by genomic location relative to Inv4m.
#' Count unique genes in a specified genomic region
#'
#' @param effects_df Data frame containing gene annotations and region indicators
#' @param region Character string specifying the region column name
#'
#' @return Integer count of unique genes in the specified region
count_genes_in_region <- function(effects_df, region) {
stopifnot(
is.data.frame(effects_df),
is.character(region),
region %in% names(effects_df),
"gene" %in% names(effects_df)
)
effects_df %>%
filter(.data[[region]]) %>%
distinct(gene) %>%
nrow()
}
effects_df <- effects_df %>%
mutate(
in_Inv4m = gene %in% inv4m_gene_ids,
in_cis = gene %in% shared_introgression_gene_ids,
in_flank = gene %in% flanking_introgression_gene_ids,
in_trans = !in_cis
)
regions <- c("in_Inv4m", "in_cis", "in_flank", "in_trans")
{
cat("\nRegion classification:\n")
print(sapply(regions, function(r) count_genes_in_region(effects_df, r)))
}
##
## Region classification:
## in_Inv4m in_cis in_flank in_trans
## 253 608 355 23403
Test whether DEGs are enriched in specific genomic regions using Fisher’s exact test.
#' Run Fisher's exact test for regional enrichment
#'
#' @param data Data frame with region and DEG indicators
#' @param region_col Column name for region indicator
#' @param deg_col Column name for DEG indicator
#'
#' @return Tibble with enrichment statistics
run_fisher <- function(data, region_col, deg_col) {
contingency <- table(data[[region_col]], data[[deg_col]])
fisher_result <- fisher.test(contingency)
tibble(
in_region_DEG = contingency[2, 2],
in_region_total = sum(contingency[2, ]),
outside_DEG = contingency[1, 2],
outside_total = sum(contingency[1, ]),
odds_ratio = fisher_result$estimate,
p_value = fisher_result$p.value,
enrichment = (contingency[2, 2] / sum(contingency[2, ])) /
(contingency[1, 2] / sum(contingency[1, ]))
)
}
# Create separate data frame for Inv4m predictor
Region_effects <- effects_df %>%
filter(predictor == "Inv4m") %>%
mutate(outside = in_trans)
# All DEG enrichment tests
deg_tests <- bind_rows(
run_fisher(Region_effects, "in_cis", "is_DEG") %>%
mutate(comparison = "Shared introgression vs Outside"),
run_fisher(
Region_effects %>% filter(in_flank | outside),
"in_flank",
"is_DEG"
) %>%
mutate(comparison = "Flanking vs Outside"),
run_fisher(
Region_effects %>% filter(in_Inv4m | outside),
"in_Inv4m",
"is_DEG"
) %>%
mutate(comparison = "Inv4m vs Outside"),
run_fisher(
Region_effects %>% filter(in_cis),
"in_Inv4m",
"is_DEG"
) %>%
mutate(comparison = "Inv4m vs Flanking")
) %>%
dplyr::select(comparison, everything())
{
cat("DEG Enrichment (FDR < 0.05):\n")
print(deg_tests)
}
## DEG Enrichment (FDR < 0.05):
## # A tibble: 4 × 8
## comparison in_region_DEG in_region_total outside_DEG outside_total odds_ratio
## <chr> <int> <int> <int> <int> <dbl>
## 1 Shared int… 272 608 193 23403 97.1
## 2 Flanking v… 158 355 193 23403 96.5
## 3 Inv4m vs O… 114 253 193 23403 98.7
## 4 Inv4m vs F… 114 253 158 355 1.02
## # ℹ 2 more variables: p_value <dbl>, enrichment <dbl>
hiconf_deg_tests <- bind_rows(
run_fisher(Region_effects, "in_cis", "is_hiconf_DEG") %>%
mutate(comparison = "Shared introgression vs Outside"),
run_fisher(
Region_effects %>% filter(in_flank | outside),
"in_flank",
"is_hiconf_DEG"
) %>%
mutate(comparison = "Flanking vs Outside"),
run_fisher(
Region_effects %>% filter(in_Inv4m | outside),
"in_Inv4m",
"is_hiconf_DEG"
) %>%
mutate(comparison = "Inv4m vs Outside"),
run_fisher(
Region_effects %>% filter(in_cis),
"in_Inv4m",
"is_hiconf_DEG"
) %>%
mutate(comparison = "Inv4m vs Flanking")
) %>%
dplyr::select(comparison, everything())
{
cat("\nHigh-confidence DEG Enrichment (FDR < 0.05, |logFC| > 1.5):\n")
print(hiconf_deg_tests)
}
##
## High-confidence DEG Enrichment (FDR < 0.05, |logFC| > 1.5):
## # A tibble: 4 × 8
## comparison in_region_DEG in_region_total outside_DEG outside_total odds_ratio
## <chr> <int> <int> <int> <int> <dbl>
## 1 Shared int… 116 608 39 23403 141.
## 2 Flanking v… 76 355 39 23403 163.
## 3 Inv4m vs O… 40 253 39 23403 112.
## 4 Inv4m vs F… 40 253 76 355 0.690
## # ℹ 2 more variables: p_value <dbl>, enrichment <dbl>
{
cat("\n=== Key Findings ===\n")
cat("1. DEGs are enriched in Inv4m region vs genome-wide\n")
cat("2. Both Inv4m and flanking show enrichment vs outside\n")
cat("3. Regional enrichment pattern observed for high-confidence DEGs\n")
cat("\nInterpretation: The introgression region (inversion + flanking)\n")
cat("shows elevated differential expression patterns.\n")
}
##
## === Key Findings ===
## 1. DEGs are enriched in Inv4m region vs genome-wide
## 2. Both Inv4m and flanking show enrichment vs outside
## 3. Regional enrichment pattern observed for high-confidence DEGs
##
## Interpretation: The introgression region (inversion + flanking)
## shows elevated differential expression patterns.
Showing top differentially expressed genes by adjusted p-value.
top_degs_qc <- effects_df %>%
filter(is_DEG, regulation != "Unregulated") %>%
group_by(predictor, regulation) %>%
arrange(-neglogP, .by_group = TRUE) %>%
dplyr::slice(1:10) %>%
dplyr::select(
predictor, gene, locus_symbol,
desc_merged, logFC, neglogP
) %>%
arrange(predictor, regulation, -neglogP)
{
cat("\nTop 10 DEGs per predictor and regulation:\n")
print(top_degs_qc)
}
##
## Top 10 DEGs per predictor and regulation:
## # A tibble: 81 × 7
## # Groups: predictor, regulation [9]
## regulation predictor gene locus_symbol desc_merged logFC neglogP
## <chr> <fct> <chr> <chr> <chr> <dbl> <dbl>
## 1 Downregulated Leaf Zm00001eb152… umc1690 Transcript… -1.48 7.65
## 2 Downregulated Leaf Zm00001eb151… <NA> NTF2 domai… -1.15 7.65
## 3 Downregulated Leaf Zm00001eb076… <NA> Protein ST… -0.949 7.65
## 4 Downregulated Leaf Zm00001eb329… <NA> Tyrosine-s… -0.631 7.65
## 5 Downregulated Leaf Zm00001eb182… <NA> protein MA… -0.607 7.65
## 6 Downregulated Leaf Zm00001eb057… zmm4 Zea mays M… -3.43 7.59
## 7 Downregulated Leaf Zm00001eb176… <NA> photosynth… -0.524 7.59
## 8 Downregulated Leaf Zm00001eb391… <NA> Short-chai… -0.601 7.48
## 9 Downregulated Leaf Zm00001eb284… <NA> CRAL-TRIO … -1.38 7.32
## 10 Downregulated Leaf Zm00001eb294… <NA> Non-specif… -1.26 7.32
## # ℹ 71 more rows
Most extreme differentially expressed genes.
top_outliers_qc <- effects_df %>%
filter(is_mh_outlier) %>%
group_by(predictor, regulation) %>%
arrange(-mahalanobis, .by_group = TRUE) %>%
dplyr::slice(1:10) %>%
dplyr::select(
predictor, regulation, gene,
locus_symbol, desc_merged,
logFC, neglogP, mahalanobis
) %>%
arrange(-neglogP)
{
cat("\nTop Mahalanobis outliers per predictor and regulation:\n")
print(top_outliers_qc)
}
##
## Top Mahalanobis outliers per predictor and regulation:
## # A tibble: 81 × 8
## # Groups: predictor, regulation [9]
## predictor regulation gene locus_symbol desc_merged logFC neglogP mahalanobis
## <fct> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Inv4m Downregul… Zm00… jmj2 JUMONJI-tr… -3.49 21.9 36226.
## 2 Inv4m Downregul… Zm00… jmj4 JUMONJI-tr… -2.84 19.7 29490.
## 3 Inv4m Downregul… Zm00… <NA> Methionyl-… -2.86 17.5 23077.
## 4 Inv4m Downregul… Zm00… <NA> SUMO-activ… -1.98 17.5 23602.
## 5 Inv4m Upregulat… Zm00… <NA> <NA> 6.50 16.9 31320.
## 6 Inv4m Upregulat… Zm00… <NA> Exocyst co… 2.87 16.5 25470.
## 7 Inv4m Downregul… Zm00… IDP4515 DnaJ-like … -1.50 15.9 19735.
## 8 Inv4m Downregul… Zm00… <NA> CCT-alpha -1.69 15.8 19281.
## 9 Inv4m Downregul… Zm00… cl19648_2a ADP-ribosy… -2.78 15.8 18568.
## 10 Inv4m Upregulat… Zm00… cl19799_1 <NA> 1.97 15.6 22010.
## # ℹ 71 more rows
# Overall DEG counts by predictor
overall_summary <- effects_df %>%
group_by(predictor) %>%
summarise(
total_genes = n(),
n_significant = sum(is_DEG),
n_DEG = sum(is_DEG),
n_hiconf_DEG = sum(is_hiconf_DEG),
n_selected_DEG = sum(is_selected_DEG),
pct_DEG = round(100 * n_DEG / total_genes, 2)
)
# Region distribution for Inv4m effect
inv4m_region_summary <- effects_df %>%
filter(predictor == "Inv4m", is_DEG) %>%
group_by(regulation, in_Inv4m, in_cis) %>%
summarise(n = n(), .groups = "drop")
{
cat("\n=== DEG Summary Statistics ===\n")
print(overall_summary)
cat("\nInv4m DEGs by region and regulation:\n")
print(inv4m_region_summary)
}
##
## === DEG Summary Statistics ===
## # A tibble: 6 × 7
## predictor total_genes n_significant n_DEG n_hiconf_DEG n_selected_DEG pct_DEG
## <fct> <int> <int> <int> <int> <int> <dbl>
## 1 Leaf 24011 8159 8159 1469 36 34.0
## 2 -P 24011 6515 6515 704 31 27.1
## 3 Leaf:-P 24011 3221 3221 685 25 13.4
## 4 Inv4m 24011 465 465 155 25 1.94
## 5 Inv4m:-P 24011 0 0 0 0 0
## 6 Leaf:Inv4m 24011 1 1 1 1 0
##
## Inv4m DEGs by region and regulation:
## # A tibble: 6 × 4
## regulation in_Inv4m in_cis n
## <chr> <lgl> <lgl> <int>
## 1 Downregulated FALSE FALSE 109
## 2 Downregulated FALSE TRUE 94
## 3 Downregulated TRUE TRUE 74
## 4 Upregulated FALSE FALSE 84
## 5 Upregulated FALSE TRUE 64
## 6 Upregulated TRUE TRUE 40
Extract selected DEGs for detailed presentation in manuscript tables.
selected_degs <- effects_df %>%
filter(is_selected_DEG) %>%
dplyr::select(
predictor,
regulation,
gene,
locus_symbol,
locus_label,
desc_merged,
std_err,
logFC,
neglogP,
mahalanobis,
pval_rank,
mahal_rank
) %>%
arrange(predictor, regulation, -neglogP)
{
cat("\n=== Selected DEGs for Manuscript ===\n")
cat("Total selected DEGs:", nrow(selected_degs), "\n")
cat("Counts by predictor and regulation:\n")
print(with(selected_degs, table(predictor, regulation)))
}
##
## === Selected DEGs for Manuscript ===
## Total selected DEGs: 118
## Counts by predictor and regulation:
## regulation
## predictor Downregulated Upregulated
## Leaf 18 18
## -P 16 15
## Leaf:-P 13 12
## Inv4m 11 14
## Inv4m:-P 0 0
## Leaf:Inv4m 1 0
Export selected DEGs specific to the phosphorus effect with pannzer description.
p_selected <- selected_degs %>%
mutate(
regulation = factor(
regulation,
levels = c("Upregulated", "Downregulated")
)
) %>%
filter(predictor == "-P") %>%
arrange(regulation, -neglogP)
{
cat("\nPhosphorus effect selected DEGs:\n")
cat(" Upregulated:", sum(p_selected$regulation == "Upregulated"), "\n")
cat(" Downregulated:", sum(p_selected$regulation == "Downregulated"), "\n")
print(p_selected)
}
##
## Phosphorus effect selected DEGs:
## Upregulated: 15
## Downregulated: 16
## # A tibble: 31 × 12
## predictor regulation gene locus_symbol locus_label desc_merged std_err logFC
## <fct> <fct> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 -P Upregulat… Zm00… pilncr1 pilncr1 pi-deficie… 0.624 7.33
## 2 -P Upregulat… Zm00… pap2 pap2 purple aci… 0.373 4.33
## 3 -P Upregulat… Zm00… ips1 ips1 induced by… 0.588 6.58
## 4 -P Upregulat… Zm00… gpx1 gpx1 glyceropho… 0.566 6.25
## 5 -P Upregulat… Zm00… GRMZM2G1355… rfk1 Riboflavin… 0.329 3.62
## 6 -P Upregulat… Zm00… <NA> <NA> Mannose-1-… 0.193 2.15
## 7 -P Upregulat… Zm00… pap19 pap19 purple aci… 0.527 5.75
## 8 -P Upregulat… Zm00… pco107612 <NA> Inorganic … 0.256 2.80
## 9 -P Upregulat… Zm00… pfk1 pfk1 phosphofru… 0.214 2.32
## 10 -P Upregulat… Zm00… <NA> <NA> FLZ-type d… 0.263 2.78
## # ℹ 21 more rows
## # ℹ 4 more variables: neglogP <dbl>, mahalanobis <dbl>, pval_rank <int>,
## # mahal_rank <int>
leaf_selected <- selected_degs %>%
mutate(
regulation = factor(
regulation,
levels = c("Upregulated", "Downregulated")
)
) %>%
filter(predictor == "Leaf") %>%
arrange(regulation, -neglogP)
{
cat("\nLeaf stage effect selected DEGs:\n")
cat(" Upregulated:", sum(leaf_selected$regulation == "Upregulated"), "\n")
cat(" Downregulated:", sum(leaf_selected$regulation == "Downregulated"), "\n")
print(leaf_selected)
}
##
## Leaf stage effect selected DEGs:
## Upregulated: 18
## Downregulated: 18
## # A tibble: 36 × 12
## predictor regulation gene locus_symbol locus_label desc_merged std_err logFC
## <fct> <fct> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Leaf Upregulat… Zm00… hir3 hir3 hypersensi… 0.0834 0.805
## 2 Leaf Upregulat… Zm00… IDP641 <NA> Glycosyltr… 0.117 1.10
## 3 Leaf Upregulat… Zm00… bhlh145 bhlh145 bHLH-trans… 0.100 0.900
## 4 Leaf Upregulat… Zm00… cyp6 cyp6 cytochrome… 0.0996 0.899
## 5 Leaf Upregulat… Zm00… trpp2 trpp2 trehalose-… 0.180 1.52
## 6 Leaf Upregulat… Zm00… umc2170 <NA> DNAJ heat … 0.0770 0.648
## 7 Leaf Upregulat… Zm00… salt1 salt1 SalT homol… 0.308 2.55
## 8 Leaf Upregulat… Zm00… <NA> <NA> <NA> 0.0937 0.736
## 9 Leaf Upregulat… Zm00… <NA> <NA> Surfactant… 0.0687 0.529
## 10 Leaf Upregulat… Zm00… wrky111 wrky111 WRKY-trans… 0.0796 0.604
## # ℹ 26 more rows
## # ℹ 4 more variables: neglogP <dbl>, mahalanobis <dbl>, pval_rank <int>,
## # mahal_rank <int>
Export selected DEGs specific to the leaf:treatment interaction.
interaction_selected <- selected_degs %>%
mutate(
regulation = factor(
regulation,
levels = c("Upregulated", "Downregulated")
)
) %>%
group_by(regulation) %>%
filter(predictor == "Leaf:-P") %>%
arrange(regulation, -neglogP)
{
cat("\nLeaf:Treatment interaction selected DEGs:\n")
cat(" Upregulated:", sum(interaction_selected$regulation == "Upregulated"), "\n")
cat(" Downregulated:",
sum(interaction_selected$regulation == "Downregulated"), "\n")
print(interaction_selected)
}
##
## Leaf:Treatment interaction selected DEGs:
## Upregulated: 12
## Downregulated: 13
## # A tibble: 25 × 12
## # Groups: regulation [2]
## predictor regulation gene locus_symbol locus_label desc_merged std_err logFC
## <fct> <fct> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Leaf:-P Upregulat… Zm00… pco139502b <NA> Pyruvate k… 0.152 1.26
## 2 Leaf:-P Upregulat… Zm00… mrpa3 mrpa3 multidrug … 0.0822 0.669
## 3 Leaf:-P Upregulat… Zm00… plc6 plc6 phospholip… 0.0812 0.580
## 4 Leaf:-P Upregulat… Zm00… <NA> <NA> Ribonuclea… 0.0903 0.641
## 5 Leaf:-P Upregulat… Zm00… bcd386a.2 bcd386a.2 RING/U-box… 0.0720 0.512
## 6 Leaf:-P Upregulat… Zm00… pld16 pld16 phospholip… 0.0836 0.588
## 7 Leaf:-P Upregulat… Zm00… <NA> <NA> Beta-galac… 0.0809 0.541
## 8 Leaf:-P Upregulat… Zm00… <NA> <NA> PI-PLC X d… 0.183 1.22
## 9 Leaf:-P Upregulat… Zm00… <NA> <NA> Heptahelic… 0.0997 0.658
## 10 Leaf:-P Upregulat… Zm00… gmp1 gmp1 mannose-1-… 0.110 0.708
## # ℹ 15 more rows
## # ℹ 4 more variables: neglogP <dbl>, mahalanobis <dbl>, pval_rank <int>,
## # mahal_rank <int>
# Load normalized expression data
voomR <- readRDS(file.path(paths$intermediate, "normalized_expression_voom_object_leaf_trt.rds"))
# Define genes of interest - High_confidence interaction genes
interaction_highconf <- effects_df %>%
filter(is_hiconf_DEG & predictor == "Leaf:-P")
interaction_genes <- interaction_highconf %>%
pull(gene) %>%
unique()
# Extract expression matrix for genes of interest
expr_matrix <- voomR$E[interaction_genes, ]
# Convert to long format and join with sample metadata
expr_long <- as.data.frame(t(expr_matrix)) %>%
tibble::rownames_to_column("sample") %>%
pivot_longer(
cols = -sample,
names_to = "gene",
values_to = "log2CPM"
) %>%
left_join(
voomR$targets %>%
tibble::rownames_to_column("sample") %>%
dplyr::select(sample, Treatment, leaf_tissue, Genotype),
by = "sample"
) %>%
left_join(
interaction_highconf %>%
filter(predictor == "Leaf:-P") %>%
dplyr::select(gene, regulation) %>%
distinct(),
by = "gene"
)
# Recode treatment levels
expr_long$leafxP <- factor(expr_long$regulation)
levels(expr_long$leafxP) <- c("Negative", "Positive")
# Center expression per gene-treatment combination
expr_long <- expr_long %>%
group_by(gene, Treatment) %>%
mutate(centered_log2CPM = log2CPM - mean(log2CPM)) %>%
ungroup()
# Calculate mean centered expression per gene per leaf stage per treatment
expr_summary <- expr_long %>%
group_by(gene, Treatment, leaf_tissue, leafxP) %>%
summarise(mean_log2CPM = mean(centered_log2CPM), .groups = "drop")
# Create separate datasets for each treatment
expr_minusP <- expr_summary %>% filter(Treatment == "-P")
expr_plusP <- expr_summary %>% filter(Treatment == "+P")
# Create spaghetti plot
base_size <- 30
# Create base plot for +P treatment (plotted first, will be in back)
p_plusP_base <- expr_plusP %>%
ggplot(aes(x = leaf_tissue, y = mean_log2CPM, color = Treatment)) +
geom_line(
aes(group = interaction(gene, Treatment)),
alpha = 0.1,
linewidth = 0.8
) %>%
with_shadow(colour = "black", x_offset = 0, y_offset = 0, sigma = 2) +
geom_smooth(
aes(group = Treatment),
method = "lm",
formula = y ~ x,
se = FALSE,
linewidth = 2
) %>%
with_shadow(colour = "black", x_offset = 0, y_offset = 0, sigma = 2) +
facet_wrap(~leafxP, ncol = 2) +
labs(
x = "Leaf",
y = expression("Centered " * log[2] * "(CPM)")
) +
theme_classic(base_size = base_size) +
theme(
strip.background = element_blank(),
strip.text.x = element_blank(),
plot.title = element_blank(),
axis.title.y = element_text(margin = margin(r = -10)),
plot.margin = margin(5.5, 7, 50, 5.5, "pt")
)
# Overlay -P data on top (plotted second, will be in front)
p_spaghetti <- p_plusP_base +
geom_line(
data = expr_minusP,
aes(
x = leaf_tissue, y = mean_log2CPM, color = Treatment,
group = interaction(gene, Treatment)
),
alpha = 0.1,
linewidth = 0.8
) +
geom_smooth(
data = expr_minusP,
aes(
x = leaf_tissue, y = mean_log2CPM, color = Treatment,
group = Treatment
),
method = "lm",
formula = y ~ x,
se = FALSE,
linewidth = 2
) %>%
with_shadow(colour = "black", x_offset = 0, y_offset = 0, sigma = 2) +
scale_color_manual(
values = c("+P" = "gold", "-P" = "purple4"),
name = "Treatment"
) +
guides(color = guide_legend(
override.aes = list(linewidth = 2, alpha = 1)
)) +
theme(
legend.position = c(0.4, 0.95),
legend.title = element_blank(),
legend.background = element_rect(fill = "transparent"),
legend.box.background = element_rect(fill = "transparent", color = NA)
)
p_spaghetti
# Select top annotated genes per regulation direction
interaction_for_profiles <- interaction_highconf %>%
mutate(regulation = factor(
regulation,
levels = c("Downregulated", "Upregulated")
)) %>%
filter(predictor == "Leaf:-P") %>%
group_by(regulation) %>%
arrange(regulation, -neglogP) %>%
dplyr::slice(1:12) %>%
ungroup() %>%
mutate(locus_label = paste0(
locus_label, " ", gene, "\n",
str_wrap(coalesce(locus_name, desc_merged),
width = 30,
whitespace_only = FALSE
)
)) %>%
mutate(locus_label = gsub("pco139502b |NA ", "", locus_label))
interaction_for_profiles$locus_label <- gsub("^ +", "", interaction_for_profiles$locus_label)
interaction_for_profiles$gene
## [1] "Zm00001eb359280" "Zm00001eb207130" "Zm00001eb389720" "Zm00001eb138650"
## [5] "Zm00001eb101660" "Zm00001eb070520" "Zm00001eb212520" "Zm00001eb179680"
## [9] "Zm00001eb362560" "Zm00001eb004410" "Zm00001eb214780" "Zm00001eb111630"
## [13] "Zm00001eb157810" "Zm00001eb376160" "Zm00001eb063230" "Zm00001eb144680"
## [17] "Zm00001eb044390" "Zm00001eb339870" "Zm00001eb011050" "Zm00001eb393060"
## [21] "Zm00001eb009430" "Zm00001eb148030" "Zm00001eb289800" "Zm00001eb297970"
# Filter and annotate existing expr_long
expr_plot <- expr_long %>%
inner_join(interaction_for_profiles) %>%
group_by(regulation) %>%
mutate(locus_label = forcats::fct_reorder(locus_label, -neglogP))
# Define gene display order (top 2 per direction)
gene_order <- interaction_for_profiles %>%
group_by(regulation) %>%
arrange(regulation, -neglogP) %>%
dplyr::slice(1:2) %>%
pull(gene)
gene_order
## [1] "Zm00001eb359280" "Zm00001eb207130" "Zm00001eb157810" "Zm00001eb376160"
p_gene <- expr_plot %>%
filter(gene %in% gene_order) %>%
group_by(regulation) %>%
arrange(regulation, -neglogP) %>%
ungroup() %>%
ggplot(aes(
x = leaf_tissue, y = log2CPM,
color = Treatment,
fill = Treatment,
group = Treatment,
shape = Treatment
)) +
stat_summary(fun = mean, geom = "line", linewidth = 1) +
stat_summary(fun = mean, geom = "point", size = 5) +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1, linewidth = 1.5) +
scale_shape_manual(values = c(19, 25)) +
scale_fill_manual(
values = c("+P" = "gold", "-P" = "purple4"),
name = "Treatment"
) +
scale_color_manual(
values = c("+P" = "gold", "-P" = "purple4"),
name = "Treatment"
) +
guides(
color = "none",
shape = guide_legend(
override.aes = list(color = c("gold", "purple4"))
)
) +
facet_wrap(~locus_label, scales = "free_y", ncol = 2, dir = "v") +
labs(
x = "Leaf",
y = expression(log[2] * "(CPM)")
) +
theme_classic(base_size = base_size) +
theme(
strip.background = element_blank(),
strip.text = element_text(size = 16, face = "bold", hjust = 0),
plot.margin = margin(-53, 5.5, 5.5, 5.5, "pt"),
legend.background = element_rect(fill = "transparent"),
legend.box.background = element_rect(fill = "transparent", color = NA),
legend.position = c(0.075, 0.65),
legend.title = element_blank()
)
p_gene
plot_genes <- interaction_for_profiles %>%
group_by(regulation) %>%
dplyr::select(predictor:gene, locus_label, desc_merged, neglogP) %>%
distinct() %>%
arrange(regulation, -neglogP) %>%
dplyr::slice(1:10) %>%
pull(gene)
p_interaction <- expr_plot %>%
filter(gene %in% plot_genes) %>%
group_by(regulation) %>%
arrange(regulation, -neglogP) %>%
ggplot(aes(
x = leaf_tissue, y = log2CPM,
color = Treatment, group = Treatment
)) +
stat_summary(fun = mean, geom = "line", linewidth = 1) +
stat_summary(fun = mean, geom = "point", size = 5) +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1, linewidth = 1.5) +
scale_color_manual(
values = c("+P" = "gold", "-P" = "purple4"),
name = "Treatment"
) +
guides(color = guide_legend(reverse = TRUE)) +
facet_wrap(~locus_label,
scales = "free_y",
ncol = 5, nrow = 4
) +
labs(
x = "Leaf",
y = expression(log[2] * "(CPM)")
) +
theme_classic(base_size = base_size) +
theme(
strip.background = element_blank(),
strip.text = element_text(size = 15, face = "bold", hjust = 0),
legend.background = element_rect(fill = "transparent"),
legend.box.background = element_rect(fill = "transparent", color = NA),
legend.position = "top",
legend.title = element_blank()
)
p_interaction
# Combine plots
right_panel <- cowplot::plot_grid(
p_spaghetti, p_gene,
ncol = 1,
align = "h",
axis = "lr"
)
ggsave(
file.path(paths$figures, "right_panel.png"),
plot = right_panel,
width = 8,
height = 15,
dpi = 150
)
right_panel
{
cat("Spaghetti plot created with shadows\n")
cat("Total unique genes plotted:", length(plot_genes), "\n")
}
## Spaghetti plot created with shadows
## Total unique genes plotted: 20
# Full effects table with all columns
write.csv(
effects_df,
file = file.path(paths$intermediate, "predictor_effects_leaf_interaction_model.csv"),
row.names = FALSE
)
{
cat("\nFull effects table exported:\n")
cat(" predictor_effects_leaf_interaction_model.csv\n")
}
##
## Full effects table exported:
## predictor_effects_leaf_interaction_model.csv
# Selected DEGs for manuscript
write.csv(
selected_degs,
file = file.path(paths$intermediate, "selected_DEGs_leaf_interaction_model.csv"),
row.names = FALSE
)
# Phosphorus selected DEGs
write.csv(
p_selected,
file = file.path(paths$intermediate, "p_selected_DEGs_leaf_interaction_model.csv"),
row.names = FALSE
)
# Leaf selected DEGs
write.csv(
leaf_selected,
file = file.path(paths$intermediate, "leaf_selected_DEGs_leaf_interaction_model.csv"),
row.names = FALSE
)
# Inv4m selected DEGs
write.csv(
selected_degs %>% filter(predictor == "Inv4m"),
file = file.path(paths$intermediate, "inv4m_selected_DEGs_leaf_interaction_model.csv"),
row.names = FALSE
)
# Leaf:Treatment interaction selected DEGs
write.csv(
interaction_selected,
file = file.path(paths$intermediate, "leafxP_selected_DEGs_leaf_interaction_model.csv"),
row.names = FALSE
)
{
cat("\nSelected DEG tables exported:\n")
cat(" selected_DEGs_leaf_interaction_model.csv - All selected DEGs\n")
cat(" p_selected_DEGs_leaf_interaction_model.csv - Phosphorus effect\n")
cat(" leaf_selected_DEGs_leaf_interaction_model.csv - Leaf effect\n")
cat(" inv4m_selected_DEGs_leaf_interaction_model.csv - Inv4m effect\n")
cat(" leafxP_selected_DEGs_leaf_interaction_model.csv - Leaf:Treatment interaction\n")
}
##
## Selected DEG tables exported:
## selected_DEGs_leaf_interaction_model.csv - All selected DEGs
## p_selected_DEGs_leaf_interaction_model.csv - Phosphorus effect
## leaf_selected_DEGs_leaf_interaction_model.csv - Leaf effect
## inv4m_selected_DEGs_leaf_interaction_model.csv - Inv4m effect
## leafxP_selected_DEGs_leaf_interaction_model.csv - Leaf:Treatment interaction
This analysis identified differentially expressed genes across four main effects:
All results include:
The full effects table contains 144066 gene × predictor combinations.
sessionInfo()
## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.6.1
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] robustbase_0.99-6 ggtext_0.1.2 ggfx_1.0.3
## [4] ggpubr_0.6.2 ggplot2_4.0.1 stringr_1.6.0
## [7] tidyr_1.3.1 dplyr_1.1.4 rtracklayer_1.70.0
## [10] GenomicRanges_1.62.1 Seqinfo_1.0.0 IRanges_2.44.0
## [13] S4Vectors_0.48.0 BiocGenerics_0.56.0 generics_0.1.4
## [16] edgeR_4.8.1 limma_3.66.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-9 rlang_1.1.6
## [3] magrittr_2.0.4 matrixStats_1.5.0
## [5] compiler_4.5.2 mgcv_1.9-4
## [7] systemfonts_1.3.1 vctrs_0.6.5
## [9] pkgconfig_2.0.3 crayon_1.5.3
## [11] fastmap_1.2.0 backports_1.5.0
## [13] magick_2.9.0 XVector_0.50.0
## [15] labeling_0.4.3 utf8_1.2.6
## [17] Rsamtools_2.26.0 rmarkdown_2.30
## [19] markdown_2.0 ragg_1.5.0
## [21] purrr_1.2.0 xfun_0.55
## [23] cachem_1.1.0 cigarillo_1.0.0
## [25] litedown_0.9 jsonlite_2.0.0
## [27] DelayedArray_0.36.0 BiocParallel_1.44.0
## [29] broom_1.0.11 parallel_4.5.2
## [31] R6_2.6.1 bslib_0.9.0
## [33] stringi_1.8.7 RColorBrewer_1.1-3
## [35] car_3.1-3 jquerylib_0.1.4
## [37] Rcpp_1.1.0 SummarizedExperiment_1.40.0
## [39] knitr_1.50 Matrix_1.7-4
## [41] splines_4.5.2 tidyselect_1.2.1
## [43] abind_1.4-8 yaml_2.3.12
## [45] codetools_0.2-20 curl_7.0.0
## [47] lattice_0.22-7 tibble_3.3.0
## [49] Biobase_2.70.0 withr_3.0.2
## [51] S7_0.2.1 evaluate_1.0.5
## [53] xml2_1.5.1 Biostrings_2.78.0
## [55] pillar_1.11.1 MatrixGenerics_1.22.0
## [57] carData_3.0-5 rprojroot_2.1.1
## [59] RCurl_1.98-1.17 commonmark_2.0.0
## [61] scales_1.4.0 glue_1.8.0
## [63] tools_4.5.2 BiocIO_1.20.0
## [65] locfit_1.5-9.12 ggsignif_0.6.4
## [67] GenomicAlignments_1.46.0 forcats_1.0.1
## [69] XML_3.99-0.20 cowplot_1.2.0
## [71] grid_4.5.2 nlme_3.1-168
## [73] restfulr_0.0.16 Formula_1.2-5
## [75] cli_3.6.5 textshaping_1.0.4
## [77] S4Arrays_1.10.1 viridisLite_0.4.2
## [79] gtable_0.3.6 DEoptimR_1.1-4
## [81] rstatix_0.7.3 sass_0.4.10
## [83] digest_0.6.39 SparseArray_1.10.6
## [85] rjson_0.2.23 farver_2.1.2
## [87] htmltools_0.5.9 lifecycle_1.0.4
## [89] httr_1.4.7 here_1.0.2
## [91] statmod_1.5.1 gridtext_0.1.5